Abstract

We show that the stick-breaking construction of the beta process due to Paisley et al. (2010) can be obtained from the characterization of the beta process as a Poisson process. Specifically, we show that the mean measure of the underlying Poisson process is equal to that of the beta process. We use this underlying representation to derive error bounds on truncated beta processes that are tighter than those in the literature. We also develop a new MCMC inference algorithm for beta processes, based in part on our new Poisson process construction.



1 Introduction 



The beta process is a Bayesian nonparametric prior for sparse collections of binary features (Thibaux & Jordan 2007). When the beta process is marginalized out, one obtains the Indian buffet process (IBP) (Griffiths & Ghahramani 2006). Many applications of this circle of ideas, including focused topic distributions (Williamson et al. 2010), featural representations of multiple time series (Fox et al. 2010) and dictionary learning for image processing (Zhou et al. 2011), are motivated from the IBP representation. However, as in the case of the Dirichlet process, where the Chinese restaurant process provides the marginalized representation, it can be useful to develop inference methods that use the underlying beta process. A step in this direction was provided by Teh et al. (2007), who derived a stick-breaking construction for the special case of the beta process that marginalizes to the one-parameter IBP.

Recently, a stick-breaking construction of the full beta process was derived by Paisley et al. (2010). The derivation relied on a limiting process involving finite matrices, similar to the limiting process used to derive the IBP. However, the beta process also has an underlying Poisson process (Jordan 2010; Thibaux & Jordan 2007), with a mean measure $\nu$ (as discussed in detail in Section 2.1). Therefore, the process presented in Paisley et al. (2010) must also be a Poisson process with this same mean measure. Showing this equivalence would provide a direct proof of Paisley et al. (2010) using the well-studied Poisson process machinery (Kingman 1993).



In this paper we present such a derivation (Section 3). In addition, we derive truncation error bounds that are tighter than those in the literature (Section 4.1) (Doshi-Velez et al. 2009; Paisley et al. 2011). The Poisson process framework also provides an immediate proof of the extension of the construction to beta processes with a varying concentration parameter and infinite base measure (Section 4.2), which does not follow immediately from the derivation in Paisley et al. (2010). In Section 5 we present a new MCMC algorithm for stick-breaking beta processes that uses the Poisson process to yield a more efficient sampler than that presented in Paisley et al. (2010).



2 The Beta Process 

In this section, we review the beta process and its marginalized representation. We discuss the link between the beta process and the Poisson process, defining the underlying Lévy measure of the beta process. We then review the stick-breaking construction of the beta process, and give an equivalent representation of the generative process that will help us derive its Lévy measure.

A draw from a beta process is (with probability one) a countably infinite collection of weighted atoms in a space $\Omega$, with weights that lie in the interval $[0, 1]$ (Hjort 1990). Two parameters govern the distribution on these weights, a concentration parameter $\alpha > 0$ and a finite base measure $\mu$, with $\mu(\Omega) = \gamma$.¹ Since such a draw is an atomic measure, we can write it as $H = \sum_{ij} \pi_{ij} \delta_{\theta_{ij}}$, where the two index values follow from Paisley et al. (2010), and we write $H \sim \mathrm{BP}(\alpha, \mu)$.

¹ In Section 4.2 we discuss a generalization of this definition that is more in line with the definition given by Hjort (1990).

Figure 1: (left) A Poisson process $\Pi$ on $[a, b] \times [0, 1]$ with mean measure $\nu = \mu \times \lambda$, where $\lambda(d\pi) = \alpha \pi^{-1}(1 - \pi)^{\alpha - 1} d\pi$ and $\mu([a, b]) < \infty$. The set $A$ contains a Poisson distributed number of atoms with parameter $\int_A \mu(d\theta)\lambda(d\pi)$. (right) The beta process constructed from $\Pi$. The first dimension corresponds to location, and the second dimension to weight.



Contrary to the Dirichlet process, which provides a probability measure, the total measure $H(\Omega) \neq 1$ with probability one. Instead, beta processes are useful as parameters for a Bernoulli process. We write the Bernoulli process $X$ as $X = \sum_{ij} z_{ij} \delta_{\theta_{ij}}$, where $z_{ij} \sim \mathrm{Bernoulli}(\pi_{ij})$, and denote this as $X \sim \mathrm{BeP}(H)$. Thibaux & Jordan (2007) show that marginalizing over $H$ yields the Indian buffet process (IBP) of Griffiths & Ghahramani (2006).

The IBP clearly shows the featural clustering property of the beta process, and is specified as follows: To generate a sample $X_{n+1}$ from an IBP conditioned on the previous $n$ samples, draw

$$X_{n+1} \mid X_{1:n} \sim \mathrm{BeP}\left( \frac{1}{\alpha + n} \sum_{m=1}^{n} X_m + \frac{\alpha}{\alpha + n}\, \mu \right).$$

This says that, for each $\theta_{ij}$ with at least one value of $X_m(\theta_{ij})$ equal to one, the value of $X_{n+1}(\theta_{ij})$ is equal to one with probability $\frac{1}{\alpha + n} \sum_{m} X_m(\theta_{ij})$. After sampling these locations, a $\mathrm{Poisson}(\alpha\mu(\Omega)/(\alpha + n))$ distributed number of new locations $\theta_{i'j'}$ are introduced with corresponding $X_{n+1}(\theta_{i'j'})$ set equal to one. From this representation one can show that $X_m(\Omega)$ has a $\mathrm{Poisson}(\mu(\Omega))$ distribution, and the number of unique observed atoms in the process $X_{1:n}$ is Poisson distributed with parameter $\sum_{m=1}^{n} \alpha\mu(\Omega)/(\alpha + m - 1)$ (Thibaux & Jordan 2007).

2.1 The beta process as a Poisson process

An informative perspective of the beta process is as a completely random measure, a construction based on the Poisson process (Jordan 2010). We illustrate this in Figure 1 using an example where $\Omega = [a, b]$ and $\mu(A) = \frac{\gamma}{b - a}\mathrm{Leb}(A)$, with $\mathrm{Leb}(\cdot)$ the Lebesgue measure. The right figure shows a draw from the beta process. The left figure shows the underlying Poisson process, $\Pi = \{(\theta, \pi)\}$.

In this example, a Poisson process generates points in the space $[a, b] \times [0, 1]$. It is completely characterized by its mean measure, $\nu(d\theta, d\pi)$ (Kingman 1993; Cinlar 2011). For any subset $A \subset [a, b] \times [0, 1]$, the random counting measure $N(A)$ equals the number of points from $\Pi$ contained in $A$. The distribution of $N(A)$ is Poisson with parameter $\nu(A)$. Moreover, for all pairwise disjoint sets $A_1, \ldots, A_n$, the random variables $N(A_1), \ldots, N(A_n)$ are independent, and therefore $N$ is completely random.

In the case of the beta process, the mean measure of the underlying Poisson process is

$$\nu(d\theta, d\pi) = \alpha \pi^{-1} (1 - \pi)^{\alpha - 1} d\pi\, \mu(d\theta). \qquad (1)$$

We refer to $\lambda(d\pi) = \alpha \pi^{-1}(1 - \pi)^{\alpha - 1} d\pi$ as the Lévy measure of the process, and $\mu$ as its base measure. Our goal in Section 3 will be to show that the following construction is also a Poisson process with mean measure equal to (1), and is thus a beta process.

2.2 Stick-breaking for the beta process

Paisley et al. (2010) presented a method for explicitly constructing beta processes based on the notion of stick-breaking, a general method for obtaining discrete probability measures (Ishwaran & James 2001). Stick-breaking plays an important role in Bayesian nonparametrics, thanks largely to a seminal derivation of a stick-breaking representation for the Dirichlet process by Sethuraman (1994). In the case of the beta process, Paisley et al. (2010) presented the following representation:

$$H = \sum_{i=1}^{\infty} \sum_{j=1}^{C_i} V_{ij}^{(i)} \prod_{l=1}^{i-1} \left(1 - V_{ij}^{(l)}\right) \delta_{\theta_{ij}}, \qquad (2)$$

$$C_i \overset{iid}{\sim} \mathrm{Poisson}(\gamma), \quad V_{ij}^{(l)} \overset{iid}{\sim} \mathrm{Beta}(1, \alpha), \quad \theta_{ij} \overset{iid}{\sim} \tfrac{1}{\gamma}\mu,$$

where, as previously mentioned, $\alpha > 0$ and $\mu$ is a non-atomic finite base measure with $\mu(\Omega) = \gamma$.

Figure 2: An illustration of the stick-breaking construction of the beta process by round index $i$ for $i \le 5$. Given a space $\Omega$ with measure $\mu$, for each index $i$ a $\mathrm{Poisson}(\mu(\Omega))$ distributed number of atoms $\theta$ are drawn i.i.d. from the probability measure $\mu/\mu(\Omega)$. To atom $\theta_{ij}$, a corresponding weight $\pi_{ij}$ is attached that is the $i$th break drawn independently from a $\mathrm{Beta}(1, \alpha)$ stick-breaking process. A beta process is $H = \sum_{ij} \pi_{ij} \delta_{\theta_{ij}}$.

This construction sequentially incorporates into $H$ a Poisson-distributed number of atoms drawn i.i.d. from $\mu/\gamma$, with each round in this sequence indexed by $i$. The atoms receive weights in $[0, 1]$, drawn independently according to a stick-breaking construction: an atom in round $i$ throws away the first $i - 1$ breaks from its stick, and keeps the $i$th break as its weight. We illustrate this in Figure 2.

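We use an equivalent definition of $H$ that reduces the total number of random variables by reducing the product $\prod_{l<i}(1 - V_l)$ to a function of a single random variable. Let $V_l$ be i.i.d. $\mathrm{Beta}(1, \alpha)$ and let $f(V_{1:i-1}) := \prod_{l=1}^{i-1}(1 - V_l)$. If $T \sim \mathrm{Gamma}(i - 1, \alpha)$, then $f(V_{1:i-1}) \overset{d}{=} \exp\{-T\}$. The construction in (2) is therefore equivalent to

$$H = \sum_{j=1}^{C_1} V_{1j} \delta_{\theta_{1j}} + \sum_{i=2}^{\infty} \sum_{j=1}^{C_i} V_{ij} e^{-T_{ij}} \delta_{\theta_{ij}}, \qquad (3)$$

$$C_i \overset{iid}{\sim} \mathrm{Poisson}(\gamma), \quad V_{ij} \overset{iid}{\sim} \mathrm{Beta}(1, \alpha), \quad T_{ij} \overset{ind}{\sim} \mathrm{Gamma}(i - 1, \alpha), \quad \theta_{ij} \overset{iid}{\sim} \tfrac{1}{\gamma}\mu.$$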
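The generative process in (3) can be simulated directly once it is truncated to a finite number of rounds. The following is a minimal sketch, not the authors' code. It assumes, purely for illustration, that $\Omega = [0, 1]$ with $\mu = \gamma \cdot \mathrm{Uniform}[0, 1]$, that $\mathrm{Gamma}(i - 1, \alpha)$ is parameterized by shape and rate, and the names `sample_poisson` and `sample_truncated_bp` are hypothetical.

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw from Poisson(lam) with Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    count, prod = 0, 1.0
    while True:
        prod *= rng.random()
        if prod <= threshold:
            return count
        count += 1


def sample_truncated_bp(alpha, gamma, n_rounds, rng):
    """Return (theta, pi) atoms from rounds 1..n_rounds of construction (3).

    Illustrative assumption: Omega = [0, 1] and mu = gamma * Uniform[0, 1].
    """
    atoms = []
    for i in range(1, n_rounds + 1):
        for _ in range(sample_poisson(gamma, rng)):   # C_i ~ Poisson(gamma)
            theta = rng.random()                      # theta_ij ~ mu / gamma
            v = rng.betavariate(1.0, alpha)           # V_ij ~ Beta(1, alpha)
            if i == 1:
                pi = v
            else:
                # T_ij ~ Gamma(shape i - 1, rate alpha); gammavariate takes a scale.
                t = rng.gammavariate(i - 1, 1.0 / alpha)
                pi = v * math.exp(-t)
            atoms.append((theta, pi))
    return atoms


if __name__ == "__main__":
    rng = random.Random(0)
    draw = sample_truncated_bp(alpha=2.0, gamma=3.0, n_rounds=30, rng=rng)
    print(len(draw), "atoms, total mass", sum(pi for _, pi in draw))
```

As a sanity check, the expected total mass of the first $R$ rounds is $\gamma\,(1 - (\alpha/(1+\alpha))^R)$, which follows from $E[V e^{-T}] = \frac{1}{1+\alpha}(\alpha/(1+\alpha))^{i-1}$ for round $i$; averaging the total mass over many draws should approach this value.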



Starting from a finite approximation of the beta process, Paisley et al. (2010) showed that (2) must be a beta process by making use of the stick-breaking construction of a beta distribution (Sethuraman 1994), and then finding the limiting case; a similar limiting-case derivation was given for the Indian buffet process (Griffiths & Ghahramani 2006). We next show that (2) can be derived directly from the characterization of the beta process as a Poisson process. This verifies the construction, and also leads to new properties of the beta process.

3 Stick-breaking from the Poisson Process 

We now prove that (2) is a beta process with parameter $\alpha > 0$ and base measure $\mu$ by showing that it has an underlying Poisson process with mean measure (1).² We first state two basic lemmas regarding Poisson processes (Kingman 1993). We then use these lemmas to show that the construction of $H$ in (3) has an underlying Poisson process representation, followed by the proof.

3.1 Representing $H$ as a Poisson process

The first lemma concerns the marking of points in a Poisson process with i.i.d. random variables. The second lemma concerns the superposition of independent Poisson processes. Theorem 1 uses these two lemmas to show that the construction in (3) has an underlying Poisson process.

Lemma 1 (marked Poisson process) Let $\Pi^*$ be a Poisson process on $\Omega$ with mean measure $\mu$. With each $\theta \in \Pi^*$ associate a random variable $\pi$ drawn independently with probability measure $\lambda$ on $[0, 1]$. Then the set $\Pi = \{(\theta, \pi)\}$ is a Poisson process on $\Omega \times [0, 1]$ with mean measure $\mu \times \lambda$.

Lemma 2 (superposition property) Let $\Pi_1, \Pi_2, \ldots$ be a countable collection of independent Poisson processes on $\Omega \times [0, 1]$. Let $\Pi_i$ have mean measure $\nu_i$. Then the superposition $\Pi = \bigcup_{i=1}^{\infty} \Pi_i$ is a Poisson process with mean measure $\nu = \sum_{i=1}^{\infty} \nu_i$.

Theorem 1 The construction of $H$ given in (2) has an underlying Poisson process.

Proof. This is an application of Lemmas 1 and 2; in this proof we fix some notation for what follows. Let $\pi_{1j} := V_{1j}$ and $\pi_{ij} := V_{ij} \exp\{-T_{ij}\}$ for $i > 1$. Let $H_i := \sum_{j=1}^{C_i} \pi_{ij} \delta_{\theta_{ij}}$ and therefore $H = \sum_{i=1}^{\infty} H_i$. Noting that $C_i \sim \mathrm{Poisson}(\mu(\Omega))$, for each $H_i$ the set of atoms $\{\theta_{ij}\}$ forms a Poisson process $\Pi_i^*$ on $\Omega$ with mean measure $\mu$. Each $\theta_{ij}$ is marked with a $\pi_{ij} \in [0, 1]$ that has some probability measure $\lambda_i$ (to be defined later). By Lemma 1, each $H_i$ has an underlying Poisson process $\Pi_i = \{(\theta_{ij}, \pi_{ij})\}$ on $\Omega \times [0, 1]$ with mean measure $\mu \times \lambda_i$. It follows that $H$ has an underlying $\Pi = \bigcup_{i=1}^{\infty} \Pi_i$, which is a superposition of a countable collection of independent Poisson processes, and is therefore a Poisson process by Lemma 2. □

² A similar result has recently been presented by Broderick et al. (2012); however, their approach differs from ours in its mathematical underpinnings. Specifically, we use a decomposition of the beta process into a countably infinite collection of Poisson processes, which leads directly to the applications that we pursue in subsequent sections. By contrast, the proof in Broderick et al. (2012) does not take this route, and their focus is on power-law generalizations of the beta process.

3.2 Calculating the mean measure of H 

We've shown that $H$ has an underlying Poisson process; it remains to calculate its mean measure. We define the mean measure of $\Pi_i$ to be $\nu_i = \mu \times \lambda_i$, and by Lemma 2 the mean measure of $\Pi$ is $\nu = \sum_{i=1}^{\infty} \nu_i = \mu \times \sum_{i=1}^{\infty} \lambda_i$. We next show that $\nu(d\theta, d\pi) = \alpha \pi^{-1}(1 - \pi)^{\alpha - 1} d\pi\, \mu(d\theta)$, which will establish the result stated in the following theorem.

Theorem 2 The construction defined in (2) is of a beta process with parameter $\alpha > 0$ and finite base measure $\mu$.

Proof. To show that the mean measure of $\Pi$ is equal to (1), we first calculate each $\nu_i$ and then take their summation. We split this calculation into two groups, $\Pi_1$ and $\Pi_i$ for $i > 1$, since the distribution of $\pi_{ij}$ (as defined in the proof of Theorem 1) requires different calculations for these two groups. We use the definition of $H$ in (3) to calculate these distributions of $\pi_{ij}$ for $i > 1$.

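Case $i = 1$. The first round of atoms and their corresponding weights, $H_1 = \sum_{j=1}^{C_1} \pi_{1j} \delta_{\theta_{1j}}$ with $\pi_{1j} := V_{1j}$, has an underlying Poisson process $\Pi_1 = \{(\theta_{1j}, \pi_{1j})\}$ with mean measure $\nu_1 = \mu \times \lambda_1$ (Lemma 1). It follows that

$$\nu_1(d\theta, d\pi) = \mu(d\theta)\, \alpha (1 - \pi)^{\alpha - 1} d\pi. \qquad (4)$$

We write $\lambda_i(d\pi) = f_i(\pi \mid \alpha) d\pi$. For example, the density above is $f_1(\pi \mid \alpha) = \alpha(1 - \pi)^{\alpha - 1}$. We next focus on calculating the density $f_i$ for $i > 1$.

Case $i > 1$. Each $H_i$ has an underlying Poisson process $\Pi_i = \{(\theta_{ij}, \pi_{ij})\}$ with mean measure $\mu \times \lambda_i$, where $\lambda_i$ determines the probability distribution of $\pi_{ij}$ (Lemma 1). As with $i = 1$, we write this measure as $\lambda_i(d\pi) = f_i(\pi \mid \alpha) d\pi$, where $f_i(\pi \mid \alpha)$ is the density of $\pi_{ij}$, i.e., of the $i$th break from a $\mathrm{Beta}(1, \alpha)$ stick-breaking process. This density plays a significant role in the truncation bounds and MCMC sampler derived in the following sections; we next focus on its derivation.

Recall that $\pi_{ij} := V_{ij} \exp\{-T_{ij}\}$, where $V_{ij} \sim \mathrm{Beta}(1, \alpha)$ and $T_{ij} \sim \mathrm{Gamma}(i - 1, \alpha)$. First, let $W_{ij} := \exp\{-T_{ij}\}$. Then by a change of variables,

$$p_W(w \mid i, \alpha) = \frac{\alpha^{i-1}}{(i-2)!}\, w^{\alpha - 1} (-\ln w)^{i-2}.$$

Using the product distribution formula for two random variables (Rohatgi 1976), the density of $\pi_{ij} = V_{ij} W_{ij}$ is

$$f_i(\pi \mid \alpha) = \int_{\pi}^{1} w^{-1}\, p_V(\pi / w \mid \alpha)\, p_W(w \mid i, \alpha)\, dw \qquad (5)$$

$$\phantom{f_i(\pi \mid \alpha)} = \frac{\alpha^{i}}{(i-2)!} \int_{\pi}^{1} w^{\alpha - 2} (-\ln w)^{i-2} \left(1 - \frac{\pi}{w}\right)^{\alpha - 1} dw.$$

Though this integral does not have a closed-form solution for a single Lévy measure $\lambda_i$, we show next that the sum over these measures does have a closed-form solution.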

The Lévy measure of $H$. Using the values of $f_i$ derived above, we can calculate the mean measure of the Poisson process underlying (2). As discussed, the measure $\nu$ can be decomposed as follows,

$$\nu(d\theta, d\pi) = \sum_{i=1}^{\infty} (\mu \times \lambda_i)(d\theta, d\pi) = \mu(d\theta)\, d\pi \sum_{i=1}^{\infty} f_i(\pi \mid \alpha).$$

By showing that $\sum_{i=1}^{\infty} f_i(\pi \mid \alpha) = \alpha \pi^{-1}(1 - \pi)^{\alpha - 1}$, we complete the proof; we refer to the appendix for the details of this calculation.
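The identity $\sum_i f_i(\pi \mid \alpha) = \alpha\pi^{-1}(1-\pi)^{\alpha-1}$ can also be checked numerically. The sketch below is only an illustration under stated assumptions: it evaluates the integral in (5) with a plain midpoint rule, truncates the sum at a finite number of rounds, and assumes $\alpha \ge 1$ so the integrand is bounded. The names `f_round` and `levy_density` are hypothetical.

```python
import math


def f_round(i, pi, alpha, n_grid=4000):
    """Density of the i-th stick-breaking weight, via midpoint quadrature of (5)."""
    if i == 1:
        return alpha * (1.0 - pi) ** (alpha - 1.0)
    # log of alpha^i / (i - 2)!, computed with lgamma to avoid overflow
    log_const = i * math.log(alpha) - math.lgamma(i - 1)
    h = (1.0 - pi) / n_grid
    total = 0.0
    for k in range(n_grid):
        w = pi + (k + 0.5) * h
        total += (w ** (alpha - 2.0)
                  * (-math.log(w)) ** (i - 2)
                  * (1.0 - pi / w) ** (alpha - 1.0))
    return math.exp(log_const) * total * h


def levy_density(pi, alpha):
    """Closed-form beta process Levy density alpha * pi^-1 * (1 - pi)^(alpha - 1)."""
    return alpha * (1.0 - pi) ** (alpha - 1.0) / pi


if __name__ == "__main__":
    alpha, pi = 2.0, 0.2
    partial_sum = sum(f_round(i, pi, alpha) for i in range(1, 61))
    print(partial_sum, levy_density(pi, alpha))
```

The number of rounds needed for the partial sum to converge grows as $\pi$ approaches zero, since the terms behave like an exponential series in $\alpha \ln(1/w)$.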

4 Some Properties of the Beta Process 

We have shown that the stick-breaking construction defined in (2) has an underlying Poisson process with mean measure $\nu(d\theta, d\pi) = \alpha\pi^{-1}(1 - \pi)^{\alpha - 1} d\pi\, \mu(d\theta)$, and is therefore a beta process. Representing the stick-breaking construction as a superposition of a countably infinite collection of independent Poisson processes is also useful for further characterizing the beta process. For example, we can use this representation to analyze truncation properties. We can also easily extend the construction in (2) to cases such as that considered in Hjort (1990), where $\alpha$ is a function of $\theta$ and $\mu$ is an infinite measure.

4.1 Truncated beta processes 

Truncated beta processes arise in the variational inference setting (Doshi-Velez et al. 2009; Paisley et al. 2011; Jordan et al. 1999). Poisson process representations are useful for characterizing the part of the beta process that is being thrown away in the truncation. Consider a beta process truncated after round $R$, defined as $H^{(R)} = \sum_{i=1}^{R} H_i$. The part being discarded, $H - H^{(R)}$, has an underlying Poisson process with mean measure

$$\nu_R^+(d\theta, d\pi) := \sum_{i=R+1}^{\infty} \nu_i(d\theta, d\pi) = \mu(d\theta) \times \sum_{i=R+1}^{\infty} \lambda_i(d\pi), \qquad (6)$$

and a corresponding counting measure $N_R^+(d\theta, d\pi)$. This measure contains information about the missing atoms.³

³ For example, the number of missing atoms having weight $\pi > \epsilon$ is Poisson distributed with parameter $\nu_R^+(\Omega, [\epsilon, 1])$.

Figure 3: Examples of the error bound. (left) The bounds for $\alpha = 3$, $\gamma = 4$ and $M = 500$. The previous bound appears in Paisley et al. (2011). (center and right) Contour plots of the $L_1$ distance between the Theorem 3 bound and the Corollary 1 bound, presented as functions of $\alpha$ and $\gamma$ for (center) $M = 100$, (right) $M = 500$. The $L_1$ distance for the left plot is 0.46. The Corollary 1 bound becomes tighter as $\alpha$ and $\gamma$ increase, and as $M$ decreases.



For truncated beta processes, a measure of closeness to the true beta process is helpful when selecting truncation levels. To this end, let data $Y_n \sim f(X_n, \phi_n)$, where $X_n$ is a Bernoulli process taking either $H$ or $H^{(R)}$ as parameters, and $\phi_n$ is a set of additional parameters (which could be globally shared). Let $Y = (Y_1, \ldots, Y_M)$. One measure of closeness is the $L_1$ distance between the marginal density of $Y$ under the beta process, $m_\infty(Y)$, and the process truncated at round $R$, $m_R(Y)$. This measure originated with work on truncated Dirichlet processes in Ishwaran & James (2000, 2001); in Doshi-Velez et al. (2009), it was extended to the beta process.

After slight modification to account for truncating rounds rather than atoms, the result in Doshi-Velez et al. (2009) implies that

$$\frac{1}{4} \int \left| m_R(Y) - m_\infty(Y) \right| dY \le P\left( \exists (i, j),\ i > R,\ 1 \le n \le M : X_n(\theta_{ij}) = 1 \right), \qquad (7)$$

with a similar proof as in Ishwaran & James (2000). This says that 1/4 times the $L_1$ distance between $m_R$ and $m_\infty$ is less than one minus the probability that, in $M$ Bernoulli processes with parameter $H \sim \mathrm{BP}(\alpha, \mu)$, there is no $X_n(\theta) = 1$ for a $\theta \in H_i$ with $i > R$. In Doshi-Velez et al. (2009) and Paisley et al. (2011), this bound was loosened. Using the Poisson process representation of $H$, we can give an exact form of this bound. To do so, we use the following lemma, which is similar to Lemma 1, but accounts for markings that are not independent of the atom.



Lemma 3 Let $(\theta, \pi)$ form a Poisson process on $\Omega \times [0, 1]$ with mean measure $\nu_R^+$. Mark each $(\theta, \pi)$ with a random variable $U$ in a finite space $S$ with transition probability kernel $Q(\pi, \cdot)$. Then $(\theta, \pi, U)$ forms a Poisson process on $\Omega \times [0, 1] \times S$ with mean measure $\nu_R^+(d\theta, d\pi) Q(\pi, U)$.

This leads to Theorem 3.



Theorem 3 Let $X_{1:M} \overset{iid}{\sim} \mathrm{BeP}(H)$ with $H \sim \mathrm{BP}(\alpha, \mu)$ constructed as in (2). For a truncation value $R$, let $E$ be the event that there exists an index $(i, j)$ with $i > R$ such that $X_n(\theta_{ij}) = 1$. Then the bound in (7) equals

$$P(E) = 1 - \exp\left\{ -\int_{(0,1]} \nu_R^+(\Omega, d\pi) \left(1 - (1 - \pi)^M\right) \right\}.$$

Proof. Let $U \in \{0, 1\}^M$. By Lemma 3, the set $\{(\theta, \pi, U)\}$ constructed from rounds $R + 1$ and higher is a Poisson process on $\Omega \times [0, 1] \times \{0, 1\}^M$ with mean measure $\nu_R^+(d\theta, d\pi) Q(\pi, U)$ and a corresponding counting measure $N_R^+(d\theta, d\pi, U)$, where $Q(\pi, \cdot)$ is a transition probability measure on the space $\{0, 1\}^M$. Let $A = \{0, 1\}^M \setminus \mathbf{0}$, where $\mathbf{0}$ is the zero vector. Then $Q(\pi, A)$ is the probability of this set with respect to a Bernoulli process with parameter $\pi$, and therefore $Q(\pi, A) = 1 - (1 - \pi)^M$. The probability $P(E) = 1 - P(E^c)$, which is equal to $1 - P(N_R^+(\Omega, [0, 1], A) = 0)$. The theorem follows since $N_R^+(\Omega, [0, 1], A)$ is a Poisson-distributed random variable with parameter $\int_{(0,1]} \nu_R^+(\Omega, d\pi) Q(\pi, A)$. □

Using the Poisson process, we can give an analytical bound that is tighter than that in Paisley et al. (2011).

Corollary 1 Given the set-up in Theorem 3, an upper bound on $P(E)$ is

$$P(E) \le 1 - \exp\left\{ -\gamma M \left( \frac{\alpha}{1 + \alpha} \right)^{R} \right\}.$$

Proof. We give the proof in the appendix.

⁴ We give a second proof using simple functions in the appendix. One can use approximating simple functions to give an arbitrarily close approximation of Theorem 3. Furthermore, since $\nu_R^+ = \nu_{R-1}^+ - \nu_R$ and $\nu_0^+ = \nu$, performing a sweep of truncation values requires approximating only one additional integral for each increment of $R$.

The bound in Paisley et al. (2011) has $2M$ rather than $M$. We observe that the term in the exponential equals the negative of $M \int_0^1 \pi\, \nu_R^+(\Omega, d\pi)$, which is the expected number of missing ones in $M$ truncated Bernoulli process observations. Figure 3 shows an example of these bounds.
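Because the Corollary 1 bound is available in closed form, it can be used to pick a truncation level. The following is a small sketch under that reading; the helper names `corollary_bound` and `smallest_truncation` and the tolerance-based selection rule are illustrative assumptions rather than a procedure given in the text.

```python
import math


def corollary_bound(alpha, gamma, n_obs, n_rounds):
    """Corollary 1 bound: 1 - exp(-gamma * M * (alpha / (1 + alpha))**R)."""
    exponent = gamma * n_obs * (alpha / (1.0 + alpha)) ** n_rounds
    return -math.expm1(-exponent)  # numerically stable 1 - exp(-x)


def smallest_truncation(alpha, gamma, n_obs, tol, max_rounds=10_000):
    """Smallest R >= 1 whose Corollary 1 bound is at most tol (hypothetical rule)."""
    for r in range(1, max_rounds + 1):
        if corollary_bound(alpha, gamma, n_obs, r) <= tol:
            return r
    raise ValueError("tolerance not reached within max_rounds")


if __name__ == "__main__":
    # Parameter values taken from the left panel of Figure 3.
    r_min = smallest_truncation(alpha=3.0, gamma=4.0, n_obs=500, tol=0.01)
    print(r_min, corollary_bound(3.0, 4.0, 500, r_min))
```

Since the exponent decays geometrically at rate $\alpha/(1+\alpha)$, the required number of rounds grows only logarithmically in $\gamma M$ and in the inverse tolerance.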

4.2 Beta processes with infinite $\mu$ and varying $\alpha$

The Poisson process allows for the construction to be extended to the more general definition of the beta process given by Hjort (1990). In this definition, the value of $\alpha(\theta)$ is a function of $\theta$, rather than a constant, and the base measure $\mu$ may be infinite, but $\sigma$-finite.⁵ Using Poisson processes, the extension of (2) to this setting is straightforward. We note that this is not immediate from the limiting case derivation presented in Paisley et al. (2010).

For a partition $(E_k)$ of $\Omega$ with $\mu(E_k) < \infty$, we treat each set $E_k$ as a separate Poisson process with mean measure

$$\nu_{E_k}(d\theta, d\pi) = \mu(d\theta)\lambda(\theta, d\pi), \quad \theta \in E_k,$$

$$\lambda(\theta, d\pi) = \alpha(\theta)\pi^{-1}(1 - \pi)^{\alpha(\theta) - 1} d\pi.$$

The transition probability kernel $\lambda$ follows from the continuous version of Lemma 3. By superposition, we have the overall beta process. Modifying (2) gives the following construction: For each set $E_k$ construct a separate $H_{E_k}$. In each round of (2), incorporate $\mathrm{Poisson}(\mu(E_k))$ new atoms $\theta_{ij}^{(k)} \in E_k$ drawn i.i.d. from $\mu/\mu(E_k)$. For atom $\theta_{ij}^{(k)}$, draw a weight $\pi_{ij}^{(k)}$ using the $i$th break from a $\mathrm{Beta}(1, \alpha(\theta_{ij}^{(k)}))$ stick-breaking process. The complete beta process is the union of these local beta processes.



5 MCMC Inference 



5.1 A distribution on observed atoms 

Before presenting the MCMC sampler, we derive a quantity that we use in the algorithm. Specifically, for the collection of Poisson processes $H_i$, we calculate the distribution on the number of atoms $\theta \in H_i$ for which the Bernoulli process $X_n(\theta)$ is equal to one for some $1 \le n \le M$. In this case, we denote the atom as being "observed." This distribution is relevant to inference, since in practice we care most about samples at these locations.

The distribution of this quantity is related to Theorem 3. There, the exponential term gives the probability that this number is zero for all $i > R$. More generally, under the prior on a single $H_i$, the number of observed atoms is Poisson distributed with parameter

$$\xi_i = \int_0^1 \nu_i(\Omega, d\pi)\left(1 - (1 - \pi)^M\right). \qquad (8)$$

The sum $\sum_{i=1}^{\infty} \xi_i < \infty$ for finite $M$, meaning a finite number of atoms will be observed with probability one.

Conditioning on there being $T$ observed atoms overall, $\theta^*_{1:T}$, we can calculate a distribution on the Poisson process to which atom $\theta^*_k$ belongs. This is an instance of Poissonization of the multinomial; since for each $H_i$ the distribution on the number of observed atoms is independent and $\mathrm{Poisson}(\xi_i)$ distributed, conditioning on $T$ the Poisson process to which atom $\theta^*_k$ belongs is independent of all other atoms, and identically distributed with $P(\theta^*_k \in H_i) \propto \xi_i$.
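The rates $\xi_i$ in (8) and the resulting prior over rounds can be approximated by simulation. The sketch below is an illustration only: it estimates $\xi_i = \gamma\, E[1 - (1 - \pi)^M]$ by Monte Carlo, drawing $\pi$ as the $i$th break $V e^{-T}$ from (3), and truncates the normalization to a finite number of rounds. The function names are hypothetical, and the closed form used for the total, $\sum_i \xi_i = \gamma \sum_{n=0}^{M-1} \alpha/(\alpha + n)$, is the equality obtained by inserting (1) into (8).

```python
import math
import random


def total_observed_rate(alpha, gamma, n_obs):
    """Closed form for sum_i xi_i = gamma * sum_{n=0}^{M-1} alpha / (alpha + n)."""
    return gamma * sum(alpha / (alpha + n) for n in range(n_obs))


def xi_monte_carlo(i, alpha, gamma, n_obs, n_samples, rng):
    """Monte Carlo estimate of xi_i = gamma * E[1 - (1 - pi)^M], pi the i-th break."""
    acc = 0.0
    for _ in range(n_samples):
        pi = rng.betavariate(1.0, alpha)
        if i > 1:
            # Gamma(shape i - 1, rate alpha); gammavariate takes a scale argument.
            pi *= math.exp(-rng.gammavariate(i - 1, 1.0 / alpha))
        acc += 1.0 - (1.0 - pi) ** n_obs
    return gamma * acc / n_samples


def round_prior(alpha, gamma, n_obs, n_rounds, n_samples, rng):
    """Return (probabilities, rates): P(atom in H_i) proportional to xi_i, i <= n_rounds."""
    xi = [xi_monte_carlo(i, alpha, gamma, n_obs, n_samples, rng)
          for i in range(1, n_rounds + 1)]
    norm = sum(xi)
    return [x / norm for x in xi], xi


if __name__ == "__main__":
    rng = random.Random(0)
    probs, xi = round_prior(2.0, 3.0, 50, 30, 5000, rng)
    print(sum(xi), total_observed_rate(2.0, 3.0, 50))
```

Comparing the truncated sum of estimates against `total_observed_rate` gives a quick indication of whether enough rounds were included.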

5.2 The sampling algorithm 

We next present the MCMC sampling algorithm. We index samples by an $s$, and define all densities to be zero outside of their support.

We derive a new MCMC inference algorithm for beta processes that incorporates ideas from the stick-breaking construction and Poisson process. In the algorithm, we re-index atoms to take one index value $k$, and let $d_k$ indicate the Poisson process of the $k$th atom under consideration (i.e., $\theta_k \in H_{d_k}$). For calculation of the likelihood, given $M$ Bernoulli process draws, we denote the sufficient statistics $m_{1,k} = \sum_{n=1}^{M} X_n(\theta_k)$ and $m_{0,k} = M - m_{1,k}$.

We use the densities $f_1$ and $f_i$, $i > 1$, derived in (4) and (5) above. Since the numerical integration in (5) is computationally expensive, we sample $w$ as an auxiliary variable. The joint density of $\pi_{ij}$ and $w_{ij}$, $0 \le \pi_{ij} \le w_{ij} \le 1$, for $\theta_{ij} \in H_i$ and $i > 1$ is

$$f_i(\pi_{ij}, w_{ij} \mid \alpha) \propto w_{ij}^{-1} (-\ln w_{ij})^{i-2} (w_{ij} - \pi_{ij})^{\alpha - 1}.$$

The density for $i = 1$ does not depend on $w$.

⁵ That is, the total measure $\mu(\Omega) = \infty$, but there is a measurable partition $(E_k)$ of $\Omega$ with each $\mu(E_k) < \infty$.



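Sample $\pi_k$. We take several random walk Metropolis-Hastings steps for $\pi_k$. Let $\pi_k^s$ be the value at step $s$. Let the proposal be $\pi_k^* = \pi_k^s + \epsilon_k^s$, where $\epsilon_k^s \overset{iid}{\sim} N(0, \sigma_\pi^2)$. Set $\pi_k^{s+1} = \pi_k^*$ with probability

$$\min\left\{ 1,\ \frac{p(m_{1,k}, m_{0,k} \mid \pi_k^*)\, f_{d_k^s}(\pi_k^* \mid w_k^s, \alpha_s)}{p(m_{1,k}, m_{0,k} \mid \pi_k^s)\, f_{d_k^s}(\pi_k^s \mid w_k^s, \alpha_s)} \right\},$$

otherwise set $\pi_k^{s+1} = \pi_k^s$. The likelihood and priors are

$$p(m_{1,k}, m_{0,k} \mid \pi) = \pi^{m_{1,k}} (1 - \pi)^{m_{0,k}},$$

$$f_{d_k}(\pi \mid w, \alpha) \propto \begin{cases} (w - \pi)^{\alpha - 1} & \text{if } d_k > 1, \\ (1 - \pi)^{\alpha - 1} & \text{if } d_k = 1. \end{cases}$$

Sample $w_k$. We take several random walk Metropolis-Hastings steps for $w_k$ when $d_k^s > 1$. Let $w_k^s$ be the value at step $s$. Set the proposal $w_k^* = w_k^s + \zeta_k^s$, where $\zeta_k^s \overset{iid}{\sim} N(0, \sigma_w^2)$, and set

$$w_k^{s+1} = w_k^* \quad \text{w.p.} \quad \min\left\{ 1,\ \frac{f(w_k^* \mid \pi_k^s, d_k^s, \alpha_s)}{f(w_k^s \mid \pi_k^s, d_k^s, \alpha_s)} \right\},$$

otherwise set $w_k^{s+1} = w_k^s$. The value of $f$ is, up to normalization, the joint density given above viewed as a function of $w$:

$$f(w \mid \pi, d, \alpha) \propto w^{-1} (-\ln w)^{d - 2} (w - \pi)^{\alpha - 1}.$$

When $d_k^s = 1$, the auxiliary variable $w_k$ does not exist, so we don't sample it. If the previous value of $d_k$ was 1, but the current value $d_k^s > 1$, we sample $w_k^s \sim \mathrm{Uniform}(\pi_k^s, 1)$ and take many random walk M-H steps as detailed above.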
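The random walk Metropolis-Hastings update for $\pi_k$ can be sketched as follows. This is a minimal illustration, not the authors' implementation: it works with the unnormalized log target $m_{1,k}\ln\pi + m_{0,k}\ln(1-\pi) + (\alpha - 1)\ln(w - \pi)$ on $0 < \pi < w$, treats the case $d_k = 1$ by setting $w = 1$, and the names `log_target` and `mh_pi` are hypothetical.

```python
import math
import random


def log_target(pi, w, alpha, m1, m0):
    """Unnormalised log density: Bernoulli likelihood times (w - pi)^(alpha - 1)."""
    if not (0.0 < pi < w):
        return -math.inf  # densities are zero outside their support
    return (m1 * math.log(pi) + m0 * math.log1p(-pi)
            + (alpha - 1.0) * math.log(w - pi))


def mh_pi(pi0, w, alpha, m1, m0, n_steps, step_sd, rng):
    """Run n_steps random-walk M-H updates for pi and return the whole chain."""
    pi = pi0
    lp = log_target(pi, w, alpha, m1, m0)
    chain = []
    for _ in range(n_steps):
        proposal = pi + rng.gauss(0.0, step_sd)
        lp_prop = log_target(proposal, w, alpha, m1, m0)
        # Accept with probability min(1, target(proposal) / target(current)).
        if rng.random() < math.exp(min(0.0, lp_prop - lp)):
            pi, lp = proposal, lp_prop
        chain.append(pi)
    return chain


if __name__ == "__main__":
    rng = random.Random(0)
    chain = mh_pi(0.5, 1.0, 2.0, m1=30, m0=70, n_steps=20000, step_sd=0.05, rng=rng)
    kept = chain[2000:]
    print(sum(kept) / len(kept))
```

With $w = 1$ the target is a $\mathrm{Beta}(m_{1,k} + 1,\ m_{0,k} + \alpha)$ density, so the chain average can be compared with its mean $(m_{1,k} + 1)/(m_{1,k} + m_{0,k} + \alpha + 1)$ as a check. The same loop applies to $w_k$ after swapping in the log of $w^{-1}(-\ln w)^{d-2}(w - \pi)^{\alpha-1}$ on $\pi < w < 1$.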



Sample new atoms. We sample new atoms in addition to the observed atoms. For each $i = 1, \ldots, \max(d_{1:T_s})$, we "complete" the round by sampling the unobserved atoms. For Poisson process $H_i$, this number has a $\mathrm{Poisson}(\gamma_s - \xi_i^s)$ distribution. We can sample additional Poisson processes as well according to this distribution. In all cases, the new atoms are i.i.d. $\mu/\gamma_s$.

Sample $d_k$. We follow the discussion in Section 5.1 to sample $d_k^{s+1}$. Conditioned on there being $T_s$ observed atoms at step $s$, the prior on $d_k^{s+1}$ is independent of all other indicators $d$, and $P(d_k^{s+1} = i) \propto \xi_i^s$, where $\xi_i^s$ is given in (8). The likelihood depends on the current value of $d_k^s$.

Case $d_k^s > 1$. The likelihood $f(\pi_k^s, w_k^s \mid d_k^{s+1} = i, \alpha_s)$ is proportional to

$$\begin{cases} \dfrac{\alpha_s^{i}}{(i-2)!}\, (w_k^s)^{-1} (-\ln w_k^s)^{i-2} (w_k^s - \pi_k^s)^{\alpha_s - 1} & \text{if } i > 1, \\[2ex] \alpha_s (1 - \pi_k^s)^{\alpha_s - 1} & \text{if } i = 1. \end{cases}$$

Case $d_k^s = 1$. In this case we must account for the possibility that $\pi_k^s$ may be greater than the most recent value of $w_k$. We marginalize the auxiliary variable $w$ numerically, and compute the likelihood as follows:

$$\begin{cases} \dfrac{\alpha_s^{i}}{(i-2)!} \displaystyle\int_{\pi_k^s}^{1} w^{-1} (-\ln w)^{i-2} (w - \pi_k^s)^{\alpha_s - 1} dw & \text{if } i > 1, \\[2ex] \alpha_s (1 - \pi_k^s)^{\alpha_s - 1} & \text{if } i = 1. \end{cases}$$

A slice sampler (Neal 2003) can be used to sample from this infinite-dimensional discrete distribution.

Sample $\alpha$. We have the option of Gibbs sampling $\alpha$. For a $\mathrm{Gamma}(\tau_1, \tau_2)$ prior on $\alpha$, the full conditional of $\alpha$ is a gamma distribution with parameters

$$\tau'_{1,s} = \tau_1 + \sum_k d_k^s, \qquad \tau'_{2,s} = \tau_2 - \sum_k \ln(w_k^s - \pi_k^s).$$

In this case we set $w_k^s = 1$ if $d_k^s = 1$.

Sample $\gamma$. We also have the option of Gibbs sampling $\gamma$ using a $\mathrm{Gamma}(\kappa_1, \kappa_2)$ prior on $\gamma$. As discussed in Section 5.1, let $T_s$ be the number of observed atoms in the model at step $s$. The full conditional of $\gamma$ is a gamma distribution with parameters

$$\kappa'_{1,s} = \kappa_1 + T_s, \qquad \kappa'_{2,s} = \kappa_2 + \sum_{n=0}^{M-1} \frac{\alpha_s}{\alpha_s + n}.$$

This distribution results from the Poisson process, and the fact that the observed and unobserved atoms form a disjoint set, and therefore can be treated as independent Poisson processes. In deriving this update, we use the equality $\sum_{i=1}^{\infty} \xi_i / \gamma_s = \sum_{n=0}^{M-1} \frac{\alpha_s}{\alpha_s + n}$, found by inserting the mean measure (1) into (8).

Sample $X$. For sampling the Bernoulli process $X$, we have that $p(X \mid \mathcal{D}, H) \propto p(\mathcal{D} \mid X)\, p(X \mid H)$. The likelihood of data $\mathcal{D}$ is independent of $H$ given $X$ and is model-specific, while the prior on $X$ only depends on $\pi$.

5.3 Experimental results



We evaluate the MCMC sampler on synthetic data. We use the beta-Bernoulli process as a matrix factorization prior for a linear-Gaussian model. We generate a data matrix $Y = \Theta(W \circ Z) + \epsilon$ with each $W_{kn} \sim N(0, 1)$, the binary matrix $Z$ has $\Pr(Z_{kn} = 1 \mid H) = \pi_k$ and the columns of $\Theta$ are vectorized $4 \times 4$ patches of various patterns (see Figure 4). To generate $H$ for generating $Z$, we let $\pi_k$ be the expected value of the $k$th atom under the stick-breaking construction with parameters $\alpha = 1$, $\gamma = 2$. We place $\mathrm{Gamma}(1, 1)$ priors on $\alpha$ and $\gamma$ for inference. We sampled $M = 500$ observations, which used a total of 20 factors. Therefore $Y \in \mathbb{R}^{16 \times 500}$ and $Z \in \{0, 1\}^{20 \times 500}$.

We ran our MCMC sampler for 10,000 iterations, collecting samples every 25th iteration after a burn-in of 2000 iterations. For sampling $\pi$ and $w$, we took 1,000 random walk
steps using a Gaussian with variance lO^'^. Inference was 
relatively fast; sampling all beta process related variables required roughly two seconds per iteration, which is significantly faster than the per-iteration average of 14 seconds for the algorithm presented in Paisley et al. (2010), where Monte Carlo integration was heavily used.

We show results in Figure 4. While we expected to learn a $\gamma$ around two, and $\alpha$ around one, we note that our algorithm is inaccurate for these values. We believe that this is largely due to our prior on $d_k$ (Section 5.1). The value of $d_k$ significantly impacts the value of $\alpha$, and conditioning on $\sum_{n=1}^{M} X_n(\theta) > 0$ gives a prior for $d_k$ that is spread widely across the rounds and allows for much variation. A possible fix for this would be conditioning on the exact value of the number of atoms in a round. This will effectively give a unique prior for each atom, and would require significantly more numerical integrations leading to a slower algorithm.

Despite the inaccuracy in learning $\gamma$ and $\alpha$, the algorithm still found the correct number of factors (initialized at 100), and found the correct underlying sparse structure of the data. This indicates that our MCMC sampler is able to perform the main task of finding a good sparse representation.⁶ It appeared that the likelihood of $\pi$ dominates inference for this value, since we observed that these samples tended to "shadow" the empirical distribution of $Z$.



⁶ The variable $\gamma$ only enters the algorithm when sampling new atoms. Since we learn the correct number of factors, this indicates that our algorithm is not sensitive to $\gamma$. Fixing the concentration parameter $\alpha$ is an option, and is often done for Dirichlet processes.

Stick-Breaking Beta Processes and the Poisson Process

Figure 4: Results on synthetic data. (left) The top 16 underlying factor loadings for MCMC iteration 10,000. The ground truth patterns are uncovered. (middle) A histogram of the number of factors. The empirical distribution centers on the truth. (right) The kernel smoothed density using the samples of $\alpha$ and $\gamma$ (see the text for discussion).



6 Conclusion 



We have used Poisson processes to prove that the stick-breaking construction presented by Paisley et al. (2010) is a beta process. We then presented several consequences of this representation, including truncation bounds, a more general definition of the construction, and a new MCMC sampler for stick-breaking beta processes. Poisson processes offer flexible representations of Bayesian nonparametric priors; for example, Lin et al. (2010) show how they can be used as a general representation of dependent Dirichlet processes. Representing a beta process as a superposition of a countable collection of Poisson processes may lead to similar generalizations.

Appendix 



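Alternate proof of Theorem 3 Let the set $B_{nk} = \left[\frac{k-1}{n}, \frac{k}{n}\right)$ and $b_{nk} = \frac{k-1}{n}$, where $n$ and $k \le n$ are positive integers. Approximate the variable $\pi \in [0, 1]$ with the simple function $g_n(\pi) = \sum_{k=1}^{n} b_{nk} \mathbf{1}_{B_{nk}}(\pi)$. We calculate the truncation error term, $P(E^c) = E\left[\prod_{i > R, j} (1 - \pi_{ij})^M\right]$, by approximating $\pi$ with $g_n$, re-framing the problem as a Poisson process with mean and counting measures $\nu_R^+$ and $N_R^+(\Omega, B)$, and then taking a limit:

$$E\left[ \prod_{i > R, j} (1 - \pi_{ij})^M \right] = \lim_{n \to \infty} \prod_{k=2}^{n} E\left[ (1 - b_{nk})^{M \cdot N_R^+(\Omega, B_{nk})} \right] \qquad (10)$$

$$\phantom{E\Big[\Big]} = \exp\left\{ \lim_{n \to \infty} -\sum_{k=2}^{n} \nu_R^+(\Omega, B_{nk}) \left( 1 - (1 - b_{nk})^M \right) \right\}.$$

For a fixed $n$, this approach divides the interval $[0, 1]$ into disjoint regions that can be analyzed separately as independent Poisson processes. Each region uses the approximation $\pi \approx g_n(\pi)$, with $\lim_{n \to \infty} g_n(\pi) = \pi$, and $N_R^+(\Omega, B)$ counts the number of atoms with weights that fall in the interval $B$. Since $N_R^+$ is Poisson distributed with mean $\nu_R^+$, the expectation follows.

Proof of Theorem 2 (conclusion) From the text, we have that $\lambda(d\pi) = f_1(\pi \mid \alpha) d\pi + \sum_{i=2}^{\infty} f_i(\pi \mid \alpha) d\pi$ with $f_1(\pi \mid \alpha) = \alpha(1 - \pi)^{\alpha - 1}$ and $f_i$ given in Equation (5) for $i > 1$. The sum of densities is

$$\sum_{i=2}^{\infty} f_i(\pi \mid \alpha) = \sum_{i=2}^{\infty} \frac{\alpha^{i}}{(i-2)!} \int_{\pi}^{1} w^{\alpha - 2} \left(\ln \frac{1}{w}\right)^{i-2} \left(1 - \frac{\pi}{w}\right)^{\alpha - 1} dw$$

$$= \alpha^2 \int_{\pi}^{1} w^{\alpha - 2} \left(1 - \frac{\pi}{w}\right)^{\alpha - 1} \sum_{i=2}^{\infty} \frac{\left(\alpha \ln \frac{1}{w}\right)^{i-2}}{(i-2)!}\, dw$$

$$= \alpha^2 \int_{\pi}^{1} w^{-2} \left(1 - \frac{\pi}{w}\right)^{\alpha - 1} dw. \qquad (9)$$

The second equality is by monotone convergence and Fubini's theorem. This leads to an exponential power series, which simplifies to the third line. The last line equals $\alpha(1 - \pi)^{\alpha}/\pi$. Adding the result of (9) to $\alpha(1 - \pi)^{\alpha - 1}$ gives $\sum_{i=1}^{\infty} f_i(\pi \mid \alpha) = \alpha\pi^{-1}(1 - \pi)^{\alpha - 1}$. Therefore, $\nu(d\theta, d\pi) = \alpha\pi^{-1}(1 - \pi)^{\alpha - 1} d\pi\, \mu(d\theta)$, and the proof is complete. □

Proof of Corollary 1 From the alternate proof of Theorem 3 above, we have $P(E) = 1 - E\left[\prod_{i > R, j}(1 - \pi_{ij})^M\right] \le 1 - E\left[\prod_{i > R, j}(1 - \pi_{ij})\right]^M$. This second expectation can be calculated as in Theorem 3 with $M$ replaced by a one. The resulting integral is analytic. Let $q_r$ be the distribution of the $r$th break from a $\mathrm{Beta}(1, \alpha)$ stick-breaking process. The negative of the term in the exponential of Theorem 3 is

$$\int_0^1 \pi\, \nu_R^+(\Omega, d\pi) = \gamma \sum_{r=R+1}^{\infty} E_{q_r}[\pi]. \qquad (11)$$

Since $E_{q_r}[\pi] = \alpha^{-1} \left(\frac{\alpha}{1+\alpha}\right)^{r}$, (11) equals $\gamma \left(\frac{\alpha}{1+\alpha}\right)^{R}$.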